The Importance of Parameter Tuning in Text Categorization

Authors

  • K Koster
  • Kun
  • Kimmo Rossi
Abstract

Text Categorization algorithms have many parameters that determine their behaviour. The effect of these parameters is hard to predict, whether objectively or intuitively, and may well depend on the corpus or on the document representation. Their values are usually adopted from previously published results. In this article we investigate the effect of parameter tuning on the accuracy of two Text Categorization algorithms: the well-known Rocchio algorithm and the lesser-known Winnow. It turns out that the optimal parameter values are sometimes very different from those found in the literature. We show that parameter tuning can greatly improve the accuracy of both algorithms, far more than Term Selection does. We also show that a non-Euclidean variant of the Vector Space Model may improve accuracy, but that at optimal parameter values this effect practically disappears.
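The abstract does not spell out the algorithms' parameters, so the following Python sketch is an illustration only. It assumes one common formulation of Rocchio: each class profile is β times the centroid of that class's training documents minus γ times the centroid of all other documents, with negative weights clipped to zero. It then tunes (β, γ) by exhaustive grid search on a held-out validation set. The helper names (`bag`, `rocchio_profiles`, `grid_search`), the toy corpus and the parameter grid are hypothetical and are not taken from the article. Winnow is not shown.

```python
import math
from itertools import product

# Documents are sparse vectors: dict mapping term -> weight.

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)

def bag(text):
    """Hypothetical helper: whitespace tokenization, raw term counts, L2 normalization."""
    counts = {}
    for tok in text.split():
        counts[tok] = counts.get(tok, 0) + 1
    return l2_normalize(counts)

def centroid(docs):
    """Component-wise mean of a list of sparse vectors (empty dict if there are no docs)."""
    total = {}
    for d in docs:
        for t, w in d.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(docs) for t, w in total.items()} if docs else {}

def rocchio_profiles(train, beta, gamma):
    """One profile per class: beta * centroid(positives) - gamma * centroid(negatives),
    with negative weights clipped to zero (an assumed, commonly used variant)."""
    profiles = {}
    for c in sorted({label for _, label in train}):
        pos = centroid([d for d, label in train if label == c])
        neg = centroid([d for d, label in train if label != c])
        prof = {t: beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
                for t in set(pos) | set(neg)}
        profiles[c] = {t: w for t, w in prof.items() if w > 0}
    return profiles

def classify(doc, profiles):
    """Return the class whose profile has the largest inner product with doc."""
    def score(c):
        return sum(w * profiles[c].get(t, 0.0) for t, w in doc.items())
    return max(profiles, key=score)

def accuracy(train, test, beta, gamma):
    """Fraction of test documents that are assigned their correct label."""
    profiles = rocchio_profiles(train, beta, gamma)
    return sum(classify(d, profiles) == label for d, label in test) / len(test)

def grid_search(train, validation, betas, gammas):
    """Evaluate every (beta, gamma) pair and return (beta, gamma, accuracy) of the best.
    On ties, the first pair tried is kept."""
    best = None
    for beta, gamma in product(betas, gammas):
        acc = accuracy(train, validation, beta, gamma)
        if best is None or acc > best[2]:
            best = (beta, gamma, acc)
    return best

# Hypothetical toy corpus, for illustration only.
train = [(bag("goal match team score"), "sport"),
         (bag("team coach match win"), "sport"),
         (bag("stock market price share"), "finance"),
         (bag("market bank share profit"), "finance")]
validation = [(bag("match team goal"), "sport"),
              (bag("bank stock profit"), "finance")]

if __name__ == "__main__":
    print(grid_search(train, validation, betas=[1.0], gammas=[0.0, 0.25, 0.5]))
```

In this formulation, only the ratio γ/β affects the decision. Multiplying both parameters by the same positive factor scales every class score equally and leaves the argmax unchanged. A real experiment would use a much finer grid, a real corpus and cross-validation rather than a two-document validation set.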

Similar resources

On the Importance of Parameter Tuning in Text Categorization

Text Categorization algorithms have many parameters that determine their behaviour. The effect of these parameters is hard to predict, whether objectively or intuitively, and may well depend on the corpus or on the document representation. Their values are usually adopted from previously published results, which may lead to less-than-optimal accuracy in experiments on particular corpora. In thi...

Efficient and Robust Parameter Tuning for Heuristic Algorithms

The main advantage of heuristic or metaheuristic algorithms over exact optimization methods is their ability to handle large-scale instances within a reasonable time, albeit at the cost of losing the guarantee of finding the optimal solution. Metaheuristic techniques are therefore appropriate choices for solving NP-hard problems to near optimality. Since the parameters of heuristi...

Text Categorization Problem

The document categorization problem has gained considerable importance in recent years because of the growing number of digital documents. This paper analyzes the performance of different classification algorithms on the text categorization problem. The importance of parameter optimization for the performance of the algorithms is also discussed. The paper mostly focuses on the SVM (Support Vector Machines) al...

Text Categorization with Support Vector Machines: Learning with Many Relevant Features (Universität Dortmund, Fachbereich Informatik, Lehrstuhl VIII Künstliche Intelligenz)

This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and they behave robustly over a variety o...

Publication date: 2003